BIASED RANDOM-TO-TOP SHUFFLING

By Johan Jonasson

Chalmers University of Technology 

Recently Wilson [Ann. Appl. Probab. 14 (2004) 274-325] introduced an
important new technique for lower bounding the mixing time of a Markov
chain. In this paper we extend Wilson's technique to find lower bounds of
the correct order for card shuffling Markov chains where at each time step
a random card is picked and put at the top of the deck. Two classes of such
shuffles are addressed, one where the probability that a given card is
picked at a given time step depends on its identity, the so-called
move-to-front scheme, and one where it depends on its position.

For the move-to-front scheme, a test function that is a combination of
several different eigenvectors of the transition matrix is used. A general
method for finding and using such a test function, under a natural negative
dependence condition, is introduced. It is shown that the correct order of
the mixing time is given by the biased coupon collector's problem
corresponding to the move-to-front scheme at hand.

For the second class, a version of Wilson's technique for complex-valued
eigenvalues/eigenvectors is used. Such variants were presented in [Random
Walks and Geometry (2004) 515-532] and [Electron. Comm. Probab. 8 (2003)
77-85]. Here we present another such variant which seems to be the most
natural one for this particular class of problems. To find the eigenvalues
for the general case of the second class of problems is difficult, so we
restrict attention to two special cases. In the first case the card that is
moved to the top is picked uniformly at random from the bottom
$k = k(n) = o(n)$ cards, and we find the lower bound
$(n^3/(4\pi^2 k(k-1)))\log n$. Via a coupling, an upper bound exceeding this
by only a factor 4 is found. This generalizes Wilson's [Electron. Comm.
Probab. 8 (2003) 77-85] result on the Rudvalis shuffle and Goel's [Ann.
Appl. Probab. 16 (2006) 30-55] result on top-to-bottom shuffles. In the
second case the card moved to the top is, with probability 1/2, the bottom
card and with probability 1/2, the card at position $n-k$. Here the lower
bound is again of order $(n^3/k^2)\log n$, but in this case this does not
seem to be tight unless $k = O(1)$. What the correct order of mixing is in
this case is an open question. We show that when $k = n/2$, it is at least
$\Theta(n^2)$.

1. Introduction. How many steps does a Markov chain need to get close 
to stationarity? This question has attracted a great amount of interest during 
the last few decades, partly because computer development has supplied the 
possibility of powerful MCMC simulation techniques. 

Much interest has been focused on card shuffling chains, that is, chains
whose state space is the symmetric group, $S_n$. During the 1980s and early
1990s a lot of progress was made and good upper and lower bounds, and often
cutoffs, were found for a great variety of different card shuffling
techniques. Among these, the Bayer-Diaconis $\frac{3}{2}\log_2 n$ cutoff for
the riffle shuffle (see [2]) is the most celebrated result. In the mid-90s
the subject fell into a relative silence as the available techniques did not
suffice to solve the remaining unsolved problems.

However, quite recently the subject was revitalized by Wilson's [12]
introduction of a powerful new technique to lower bound the mixing time.
The idea is to find an easily expressed (right) eigenvector of the
transition matrix, corresponding to an eigenvalue close to 1, and use this
eigenvector to construct a test function. In [12] Wilson established
essentially tight (i.e., tight up to a constant) lower bounds for
neighbor-transposing shuffling and also for so-called lozenge tilings. A
variant of Wilson's technique for complex-valued eigenvalues/eigenvectors
was used by Mossel, Peres and Sinclair [8] to establish a tight lower bound
for the cyclic-to-random shuffle. Wilson himself used another such variant
to establish the $\Theta(n^3\log n)$ mixing time for the Rudvalis shuffle.
(This result was generalized by Goel [4] who considered random-to-bottom
shuffling in general.) Jonasson [6] developed a version of Wilson's
technique where the test function need only be constructed from something
sufficiently close to an eigenvector, which made it possible to establish
the $\Theta(n^2\log n)$ mixing time for the overhand shuffle.

In the present paper this development is continued. We consider random- 
to-top shuffles, that is, card shuffling chains where every transition is such 
that some card is moved from its present position to the top of the deck 
without affecting the relative positions of the other cards. The shuffles will 
be biased, that is, such that the card taken to the top is in general not 
chosen uniformly at random from all the cards. We consider two classes of 
biased random-to-top shuffles: 

(1) Each card, $k$, is once and for all given a probability $p_k$ and the
card taken to the top is chosen according to these probabilities, regardless
of the present order of the deck.

(2) Same as (1), except that the probability $p_k$ is assigned to position
$k$ in the deck.
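
The two classes can be made concrete with a small simulation. The following is only a sketch under stated assumptions: cards are labelled 0, ..., n-1, a deck is a Python list whose first entry is the top card, and the function names are illustrative and do not come from the paper.

```python
import random


def move_to_front_step(deck, card_probs, rng):
    """Class (1): card c is picked with probability card_probs[c], whatever its position."""
    card = rng.choices(range(len(deck)), weights=card_probs)[0]
    return [card] + [c for c in deck if c != card]


def position_biased_step(deck, pos_probs, rng):
    """Class (2): the card at position i (0 = top) is picked with probability pos_probs[i]."""
    pos = rng.choices(range(len(deck)), weights=pos_probs)[0]
    return [deck[pos]] + deck[:pos] + deck[pos + 1:]


rng = random.Random(1)
deck = list(range(5))
# Rudvalis-type choice: only the two bottom positions can be picked.
deck = position_biased_step(deck, [0, 0, 0, 0.5, 0.5], rng)
print(deck)
```

In both functions the relative order of the untouched cards is preserved, which is the defining property of a random-to-top shuffle.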
The case described by (1) is often referred to as the move-to-front scheme.
It will always be assumed (without loss of generality) that all the $p_k$'s
are positive. The move-to-front scheme clearly describes an irreducible
aperiodic Markov chain, so it converges to its stationary distribution whose
nature is such that the probability of having card $c_j$ in position $j$,
$j \in [n]$, equals

\[
p_{c_1}\cdot\frac{p_{c_2}}{1-p_{c_1}}\cdot\frac{p_{c_3}}{1-p_{c_1}-p_{c_2}}
\cdots\frac{p_{c_n}}{1-\sum_{i=1}^{n-1}p_{c_i}};
\]

see, for example, [9]. To the best of my knowledge, convergence rates for
the move-to-front scheme have only been studied for the case where all
the $p_k$'s are equal, that is, the ordinary random-to-top shuffle, for
which one has a cutoff at $n\log n$ shuffles; see [1]. Rodrigues [9] studied
a variant of the move-to-front scheme where the identities of the cards
moved to top at different shuffles are dependent. For the ordinary
top-to-random shuffle, the mixing time is given by the classical
coupon-collector's problem. In Section 3 we show that the corresponding
biased coupon-collector's problem in the general case gives the correct
answer up to a constant factor. Our lower bounds are established via an
extension of Wilson's technique where several different
eigenvalues/eigenvectors are used to construct the test function.
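
The product formula for the stationary distribution can be evaluated directly for small decks. The sketch below assumes 0-based card labels and an ordering listed from the top of the deck; the function name is illustrative. The last factor of the product always equals 1 (the remaining mass is exactly the probability of the last card), so it is skipped.

```python
from itertools import permutations


def stationary_probability(order, p):
    """Stationary probability that the deck reads `order` from top to bottom.

    order: card labels (0-based) from the top; p[c]: pick probability of card c.
    """
    prob = 1.0
    remaining = 1.0  # 1 minus the probabilities of the cards already placed
    for card in order[:-1]:
        prob *= p[card] / remaining
        remaining -= p[card]
    return prob


p = [0.4, 0.3, 0.2, 0.1]
total = sum(stationary_probability(s, p) for s in permutations(range(len(p))))
print(total)  # the probabilities over all orderings should add up to 1
```

The formula is the law of sampling the cards without replacement with weights $p_k$, which is why the values sum to 1.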

Case (2) with $p_{n-1} = p_n = 1/2$ is known as the Rudvalis shuffle.
Therefore, we refer to case (2) from now on as generalized Rudvalis (or GR)
shuffles. The special case when, for some $k < n$, the card moved to the top
is chosen uniformly among the bottom $k = k(n)$ cards is, in the terminology
of Goel [4], referred to as the bottom-to-top shuffle. When $k = O(1)$, it
is known that the mixing time is of order $n^3\log n$; for the Rudvalis
shuffle, see [5] and [11] and for the general case, see [4]. Goel also
estimated the mixing time for other values of $k$: When $k = O(n)$, it is
shown that the mixing time is at least of order $n\log n$ and at most of
order $n^2\log n$ and that when $k$ is sufficiently close to $n$, the
correct order is $n\log n$. In Section 4, which is devoted to GR shuffles,
we improve the previous results on the bottom-to-top shuffles. For
$k = o(n)$, we find that the correct order of the mixing time is
$(n^3/k^2)\log n$ and find upper and lower bounds that only differ by a
factor 4. In the case $k = \Theta(n)$ it is shown that the mixing time is
$\Theta(n\log n)$. The upper bounds are found via coupling and to estimate
the coupling time, we use the trick of imagining the top of the deck as
shifting cyclically one step down the deck for every time step. Then if a
card is only observed at times it is touched, its motion is described by a
random walk whose step size has mean 0 and variance of order $k^2$. This can
then be used together with the nature of the coupling to conclude that the
time to get a particular card matched is roughly bounded by $n$ times the
time taken for a random walk of this type started at $n/2$ to exit the
interval $[0,n]$. The lower bounds are found using a version of Wilson's
technique designed to handle complex-valued eigenvalues/eigenvectors.
Wilson [11] and Saloff-Coste [10] designed other such variants. They also
pointed out that if one tries to apply Wilson's original lemma in
complex-valued cases without further thought, one often ends up with results
that are much too weak to be of interest. Our version is perhaps the most
straightforward adaptation of Wilson's technique to the complex-valued case
and it does not need the chain under study to be lifted to a larger state
space. It works well for the present applications and we believe that it is
fairly generally applicable, but it would, for example, not solve all
problems in [11].

The lower bound technique, in principle, works for GR shuffles in general.
However, in practice, it is very difficult to find the eigenvalues, even for
the motion of a single card, except in the simplest cases. In addition to
the bottom-to-top shuffle, we also carry out the calculations for the cases
$p_{n-k} = p_n = 1/2$ for different $k$'s and find a lower bound which is
almost identical to the one for the bottom-to-top shuffle with the same $k$.
However, unlike the bottom-to-top case, this lower bound does not seem to be
of the correct order in this case, unless $k = O(1)$. We show that when
$k = n/2$, the mixing time is at least $\Theta(n^2)$. In fact, single-card
motion takes $\Theta(n^2)$ steps to mix, so this is an example of a
situation where the second eigenvalue for single-card motion does not
capture the correct order of the mixing time, not even for the single-card
chain itself. Unfortunately we have not been able to produce any good upper
bound, and it is a wide open question what the order of mixing really is.

The next section gives the necessary preliminaries. 

2. Preliminaries. 

2.1. Basic definitions. The most common way to measure the distance
between two probability measures $\mu$ and $\nu$ on a finite set $S$ is by
the total variation norm given by

\[
\|\mu-\nu\| := \frac{1}{2}\sum_{s\in S}|\mu(s)-\nu(s)|
 = \max_{A\subseteq S}(\mu(A)-\nu(A)).
\]

If $\{X_t\}_{t=0}^{\infty}$ is an aperiodic irreducible Markov chain on
state space $S$ and with stationary distribution $\pi$, started from a fixed
state $s$, then its mixing time, $\tau_{\mathrm{mix}}$, is defined via

\[
\tau(s) := \min\{t : \|P(X_t\in\cdot)-\pi\| \le \tfrac{1}{4}\}
\]

and

\[
\tau_{\mathrm{mix}} := \max_{s}\tau(s).
\]
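
For very small decks these definitions can be evaluated exactly by pushing the whole distribution forward one step at a time. The sketch below does this for a move-to-front chain; the threshold 1/4, the dictionary representation of distributions, and all function names are assumptions made for illustration.

```python
from itertools import permutations


def total_variation(mu, nu):
    """Half the L1 distance between two distributions given as dicts."""
    states = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(s, 0.0) - nu.get(s, 0.0)) for s in states)


def step(dist, p):
    """One move-to-front step: card c goes to the top with probability p[c]."""
    new = {}
    for deck, mass in dist.items():
        for c, pc in enumerate(p):
            moved = (c,) + tuple(x for x in deck if x != c)
            new[moved] = new.get(moved, 0.0) + mass * pc
    return new


def mixing_time_from(start, p, stationary, threshold=0.25, max_steps=10_000):
    """Smallest t at which the chain started at `start` is within `threshold` of stationarity."""
    dist = {tuple(start): 1.0}
    for t in range(max_steps + 1):
        if total_variation(dist, stationary) <= threshold:
            return t
        dist = step(dist, p)
    raise RuntimeError("threshold not reached within max_steps")


n = 3
uniform = {s: 1 / 6 for s in permutations(range(n))}
print(mixing_time_from(range(n), [1 / n] * n, uniform))
```

The state space has n! elements, so this exact computation is only feasible for a handful of cards; it is meant as a sanity check of the definitions, not as a way to study asymptotics.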

The mixing time is often expressed in terms of some measure of the size
of the state space. When doing so one implicitly considers a sequence of
Markov chains $\{X_t^n\}$, $n = 1,2,3,\ldots$, on state spaces
$\mathcal{S}_n$ and with stationary distributions $\pi_n$, where the state
spaces are such that $|\mathcal{S}_n| \uparrow \infty$ in some natural way.
In our case $\mathcal{S}_n = S_n$, the symmetric group on $n$ cards, and the
mixing time is expressed in relation to the number of cards, $n$, as
$n \to \infty$. The sequence of Markov chains is said to have a cutoff at
$\tau_{\mathrm{mix}} := \tau_{\mathrm{mix}}^n$ if, for every $a > 0$,

\[
\lim_{n\to\infty}\|P(X^n_{(1+a)\tau_{\mathrm{mix}}}\in\cdot)-\pi_n\| = 0
\]

and

\[
\lim_{n\to\infty}\|P(X^n_{(1-a)\tau_{\mathrm{mix}}}\in\cdot)-\pi_n\| = 1.
\]

2.2. Coupling. A common technique to find upper bounds on the mixing
time is coupling: Suppose $\{X_t\}$ is the Markov chain under study, started
from a fixed state $s$ maximizing $\tau(s)$, and that $\{Y_t\}$ is a chain
with the same transition rule, but started from stationarity. Suppose also
that the updates of the two chains have been set up, or coupled, in such a
way that as soon as $X_{t_0} = Y_{t_0}$ for some $t_0$, then $X_t = Y_t$ for
all $t \ge t_0$. Put

\[
T := \min\{t : X_t = Y_t\}.
\]

Then $T$ is called the coupling time and the coupling inequality (see, e.g.,
[7], Section 1.2) states that, for all $t$,

\[
\|P(X_t\in\cdot)-\pi\| \le P(T > t).
\]

The coupling inequality follows from the definition of total variation norm
and the simple observation that, on the event $\{T \le t\}$, $X_t$ and $Y_t$
are the same.

2.3. Wilson's technique. Here we state and prove a basic version of
Wilson's technique that we will later develop in a few different ways. Let
$\{X_t\}_{t=0}^{\infty}$ be an irreducible aperiodic Markov chain on state
space $S$ and with stationary distribution $\pi$, starting from a fixed
state $s_0$. Let $\Phi : S \to \mathbb{C}$ and $\gamma \in (0,1/2)$ be such
that, for every $s \in S$,

\[
E[\Phi(X_{t+1}) \mid X_t = s] = (1-\gamma)\Phi(s),
\]

that is, $1-\gamma$ is an eigenvalue for the transition matrix of the chain
and $\Phi$ is a corresponding eigenvector. Let $R$ be a number such that

\[
R \ge \max_{s\in S}E[|\Phi(X_{t+1})-\Phi(X_t)|^2 \mid X_t = s].
\]

Theorem 2.1. Fix $a > 0$ and let

\[
T := \frac{\log|\Phi(s_0)| - (1/2)\log(4R/(\gamma a))}{-\log(1-\gamma)}.
\]

Then $\|P(X_t\in\cdot)-\pi\| \ge 1-a$ for all $t \le T$.
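
The bound of Theorem 2.1 is a closed-form expression in $|\Phi(s_0)|$, $R$, $\gamma$ and $a$, so it is easy to evaluate. The helper below is a minimal sketch (the function name and the input checks are assumptions); it also illustrates that $T$ is exactly the time at which $(1-\gamma)^t|\Phi(s_0)|$ drops to $2\sqrt{R/(\gamma a)}$.

```python
from math import log, sqrt


def wilson_lower_bound(phi0_abs, R, gamma, a):
    """T from Theorem 2.1: total variation is at least 1 - a for all t <= T."""
    if not (0 < gamma < 0.5 and a > 0 and R > 0 and phi0_abs > 0):
        raise ValueError("need 0 < gamma < 1/2 and positive a, R, |Phi(s_0)|")
    return (log(phi0_abs) - 0.5 * log(4 * R / (gamma * a))) / (-log(1 - gamma))


phi0_abs, R, gamma, a = 100.0, 1e-2, 1e-3, 0.5
T = wilson_lower_bound(phi0_abs, R, gamma, a)
# At t = T the expected value of the test function meets the Chebyshev threshold.
print(T, (1 - gamma) ** T * phi0_abs, 2 * sqrt(R / (gamma * a)))
```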
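Proof. By induction, $E\Phi(X_t) = (1-\gamma)^t\Phi(s_0)$. Therefore,
$E\Phi(X_\infty) = 0$, with $X_\infty$ denoting a state chosen from the
stationary distribution. Put $\Delta\Phi = \Phi(X_{t+1}) - \Phi(X_t)$. We
have that

\[
\begin{aligned}
E[|\Phi(X_{t+1})|^2 \mid X_t] &= E[|\Phi(X_t)+\Delta\Phi|^2 \mid X_t] \\
 &= (1-2\gamma)|\Phi(X_t)|^2 + E[|\Delta\Phi|^2 \mid X_t] \\
 &\le (1-2\gamma)|\Phi(X_t)|^2 + R.
\end{aligned}
\]

By induction using that $\gamma < 1/2$,

\[
E|\Phi(X_t)|^2 \le (1-2\gamma)^t|\Phi(s_0)|^2 + \frac{R}{2\gamma}.
\]

Thus, again using that $\gamma < 1/2$,

\[
\begin{aligned}
\mathrm{Var}\,\Phi(X_t) &= E|\Phi(X_t)|^2 - |E\Phi(X_t)|^2 \\
 &\le ((1-2\gamma)^t - (1-\gamma)^{2t})|\Phi(s_0)|^2 + \frac{R}{2\gamma}
 \le \frac{R}{2\gamma}.
\end{aligned}
\]

By Chebyshev's inequality,

\[
P\Big(|\Phi(X_t) - E\Phi(X_t)| \ge \sqrt{\frac{R}{\gamma a}}\Big) \le \frac{a}{2}
\]

and, on letting $t \to \infty$,

\[
P\Big(|\Phi(X_\infty)| \ge \sqrt{\frac{R}{\gamma a}}\Big) \le \frac{a}{2}.
\]

Thus, if $t$ is such that $|E\Phi(X_t)| \ge 2\sqrt{R/(\gamma a)}$, we have
$\|P(X_t\in\cdot)-\pi\| \ge 1-a$. This, however, is the case precisely when
$t \le T$. □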

3. The move-to-front scheme. Recall the move-to-front scheme. A deck
of $n$ cards is shuffled by the following rule: To each card $k$,
$k \in [n]$, we attach once and for all a probability $p_k$. We assume
without loss of generality that $p_k > 0$ for every $k$ and that
$p_1 \ge p_2 \ge \cdots \ge p_n$. We will also for technical reasons work
under the mild condition that $p_k \le 1/3$ for every $k$. At each time step
a card is chosen according to these probabilities and then moved to the top
of the deck without altering the relative positions among the other cards.
Let $\{X_t\}_{t=0}^{\infty}$ denote the Markov chain on the symmetric group
$S_n$ defined in this way, started from a permutation $X_0 = s_0$ maximizing
the time taken to reach stationarity.
3.1. Upper bound. To find an upper bound on the mixing time, we use
coupling. For this we need another deck $\{Y_t\}$ started from stationarity
to be updated according to the same transition rule. The coupling is given
by the following simple rule: At each time step let the same card be moved
to the top for the two decks. Let $T = \min\{t : X_t = Y_t\}$ be the
coupling time. Considering only one of the decks, a moment's thought reveals
that at the first time that all but one card has at least once been picked
to the top, the order of the cards no longer depends on their starting
positions, but can be read off completely from the order in which they have
been moved to the top. More precisely, the later a card has been picked, the
higher up in the deck it is, with the still untouched card at the very
bottom of the deck. Now considering again both decks, this means that at
this time the two decks must agree. Therefore, the problem can be solved by
solving the biased coupon collector's problem naturally appearing from the
different card probabilities: with

\[
\tau_u := \min\Big\{t : \sum_{k=1}^{n-1}(1-p_k)^t \le \frac{1}{4}\Big\},
\]

the coupling inequality tells us that $\tau_{\mathrm{mix}} \le \tau_u$. Our
next task is to prove that $\tau_u$ is, up to a constant, the correct mixing
time.
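
Both ingredients of the upper bound are easy to check numerically. The sketch below computes the biased coupon collector time (the smallest $t$ with $\sum_{k=1}^{n-1}(1-p_k)^t \le 1/4$, the sum running over all cards except the least likely one) by direct search, and simulates the coupling to confirm that the two decks agree once all but one card has been picked. Function names, the 0-based labels and the threshold argument are illustrative assumptions.

```python
import random


def tau_u(p, threshold=0.25):
    """Smallest t with sum over all but the least likely card of (1 - p_k)^t <= threshold."""
    q = sorted(p, reverse=True)[:-1]  # drop the smallest probability
    t = 0
    while sum((1 - pk) ** t for pk in q) > threshold:
        t += 1
    return t


def coupled_run(p, deck_x, deck_y, rng):
    """Move the same card to the top of both decks until all but one card has been picked."""
    cards = list(range(len(p)))
    picked = set()
    steps = 0
    while len(picked) < len(p) - 1:
        c = rng.choices(cards, weights=p)[0]
        deck_x = [c] + [x for x in deck_x if x != c]
        deck_y = [c] + [y for y in deck_y if y != c]
        picked.add(c)
        steps += 1
    return steps, deck_x, deck_y


p = [0.4, 0.3, 0.2, 0.1]
steps, x, y = coupled_run(p, [0, 1, 2, 3], [3, 2, 1, 0], random.Random(0))
print(tau_u(p), steps, x == y)
```

The linear search in `tau_u` is adequate for illustration; for large decks a bisection over $t$ would be the natural replacement, since the sum is decreasing in $t$.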

3.2. Lower bound. Because of the fact that different cards are picked
with different probabilities, it is difficult to find eigenvectors for
single eigenvalues that, evaluated at $X_t$, are sufficiently concentrated
around their mean to provide a good enough test function for sharp lower
bounds. We shall circumvent this problem by combining eigenvectors
corresponding to several different eigenvalues. For a general treatment of
the idea, we assume that the setting is the same as in Section 2.3, but that
$(1-\gamma_j, \Phi_j)$, $j = 1,\ldots,m$, are possibly different
eigenvalue/eigenvector pairs for the transition matrix. For simplicity,
assume that the $\Phi_j$'s are scaled in such a way that, for all $j$,
$\max_{s\in S}E[|\Delta\Phi_j|^2 \mid X_t = s] \le \gamma_j$. [The scaling
does not in any way affect the lower bound that comes out in the end; what
matters is the relation between the variance of $\Phi_j(X_t)$ and
$\Phi_j(s_0)$.] Assume also that, for every $j \ne k$ and $t$,

\[
\mathrm{Cov}(\Phi_j(X_t), \Phi_k(X_t)) \le 0.
\]

Put

\[
\Phi(X_t) = \Phi^t(X_t) := \sum_{j=1}^{m} a_j\Phi_j(X_t),
\]

where the $a_j = a_j(t)$, $j \in [m]$, form a unit vector, that is,
$\sum_{j=1}^{m} a_j^2 = 1$. The precise choice of the $a_j$'s should be made
in such a way that the lower bound achieved is optimized. Reasoning as in
the proof of Theorem 2.1, $\mathrm{Var}\,\Phi_j(X_t) \le 1/2$. Thus, by the
negative covariance assumption, $\mathrm{Var}\,\Phi(X_t) \le 1/2$ and by
letting $t \to \infty$, $\mathrm{Var}\,\Phi(X_\infty) \le 1/2$. Note also
that $E\Phi(X_t) = \sum_{j=1}^{m} a_j(1-\gamma_j)^t\Phi_j(s_0)$ and
$E\Phi(X_\infty) = 0$. Continuing along the lines of the proof of
Theorem 2.1 now reveals that $\|P(X_t\in\cdot)-\pi\| \ge 1/3$ as long as $t$
is such that $|E\Phi(X_t)| \ge \sqrt{6}$, that is, if
$|\sum_{j=1}^{m} a_j(1-\gamma_j)^t\Phi_j(s_0)| \ge \sqrt{6}$ for an optimal
choice of the $a_j$'s.

Let us now move back to the move-to-front scheme. Let the deck start
with card $n$ in position 1, card $n-1$ in position 2, ..., card 1 in
position $n$ and denote this state as before by $s_0$. Assume for simplicity
that $n$ is even. For every odd $j \in [n]$, define

\[
\Phi_j(X_t) :=
\begin{cases}
\dfrac{p_j}{p_j+p_{j+1}}, & \text{if } X_t(j+1) < X_t(j), \\[2ex]
-\dfrac{p_{j+1}}{p_j+p_{j+1}}, & \text{if } X_t(j) < X_t(j+1).
\end{cases}
\]

In words, $\Phi_j$ is assigned the positive value when card $j+1$ is above
card $j$ in the deck and the negative value when the opposite situation
occurs. Then $\Phi_j$ is an eigenvector for the eigenvalue $1-\gamma_j$,
where $\gamma_j := p_j + p_{j+1}$. We have that
$E[|\Delta\Phi_j|^2 \mid X_t] \le p_j \le \gamma_j$ so the general situation
just described applies if it can be shown that the $\Phi_j$'s are negatively
correlated. However, the simplicity of the situation allows for a direct
calculation of the covariance of $\Phi_j(X_t)$ and $\Phi_k(X_t)$,
$j \ne k$:

\[
E[\Phi_j(X_t)\Phi_k(X_t)] = (1-\gamma_j-\gamma_k)^t\,
\frac{p_j}{\gamma_j}\frac{p_k}{\gamma_k}
\]

and

\[
E[\Phi_j(X_t)] = (1-\gamma_j)^t\,\frac{p_j}{\gamma_j},
\]

so

\[
\mathrm{Cov}(\Phi_j(X_t),\Phi_k(X_t))
 = \big((1-\gamma_j-\gamma_k)^t - (1-\gamma_j)^t(1-\gamma_k)^t\big)
 \frac{p_j p_k}{\gamma_j\gamma_k} \le 0.
\]
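
The eigenvector property of the pair functions can be checked by brute force on a small deck. The sketch below enumerates all orderings of four cards and measures how far $E[\Phi_j(X_{t+1}) \mid X_t = s]$ is from $(1-\gamma_j)\Phi_j(s)$. It assumes 0-based card labels (so the paper's odd $j$ corresponds to even indices here) and decks written from the top; the names are illustrative.

```python
from itertools import permutations


def phi(deck, j, p):
    """Pair function for cards j and j+1 (0-based); deck[0] is the top card."""
    gamma = p[j] + p[j + 1]
    if deck.index(j + 1) < deck.index(j):  # card j+1 lies above card j
        return p[j] / gamma
    return -p[j + 1] / gamma


def max_eigen_residual(p, j):
    """Largest |E[phi(next deck) | deck] - (1 - gamma) * phi(deck)| over all decks."""
    gamma = p[j] + p[j + 1]
    worst = 0.0
    for deck in permutations(range(len(p))):
        expected = sum(
            pc * phi((c,) + tuple(x for x in deck if x != c), j, p)
            for c, pc in enumerate(p)
        )
        worst = max(worst, abs(expected - (1 - gamma) * phi(deck, j, p)))
    return worst


p = [0.3, 0.3, 0.2, 0.2]
print(max_eigen_residual(p, 0), max_eigen_residual(p, 2))  # both should be ~0
```

The residual vanishes because moving either card of the pair to the top resets the pair's relative order, and the two values of the function are chosen so that the average over that reset is zero.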

Since $\Phi_j(s_0) = p_j/\gamma_j \ge 1/2$, it follows from above that the
variation distance from stationarity is at least 1/3 when $t$ is such that
$\sum_{j:\,j\ \mathrm{odd}} a_j(1-\gamma_j)^t \ge 2\sqrt{6}$. In order to
make this bound on $t$ as good as possible, we pick the $a_j$'s so that the
left-hand side is maximized, that is,
$a_j = (1-\gamma_j)^t/(\sum_{k:\,k\ \mathrm{odd}}(1-\gamma_k)^{2t})^{1/2}$.
Doing so we get that variation distance is at least 1/3 for $t$ such that

\[
\sum_{j:\,j\ \mathrm{odd}}(1-\gamma_j)^{2t} \ge 24.
\]
In order to put this in relation to $\tau_u$ above, recall that all $p_j$'s
are assumed not to exceed 1/3. Therefore,
$(1-\gamma_j)^{2t} \ge (1-2p_j)^{2t} \ge (1-p_j)^{6t}$. Note also that
$\sum_{j:\,j\ \mathrm{odd}}(1-p_j)^{6t} \ge \frac{1}{2}\sum_{j=1}^{n-1}(1-p_j)^{6t}$.
Therefore, variation distance is at least 1/3 as long as

\[
\sum_{j=1}^{n-1}(1-p_j)^{6t} \ge 48.
\]

Put $\tau_0$ for the largest $t$ for which this inequality holds. Now in
case $\tau_0 < \log(3/4)/\log(1-p_{n-1})$, then
$(1-p_{n-1})^{\tau_0} > 3/4$. In this case $\tau_0$ is not a good lower
bound and a better one is given by the trivial one
$\tau_1 := \max\{t : (1-p_{n-1})^t > 3/4\}$. (Trivial because the
probability at stationarity of having card $n-1$ above card $n$ is at least
1/2 and it is less than 1/4 at time $\tau_1$.) However, since
$\sum_{j=1}^{n-1}(1-p_j)^{6(\tau_1+1)} < 48$ and
$(1-p_j)^{\tau_1+1} \le 3/4$ for all $j \le n-1$,

\[
\sum_{j=1}^{n-1}(1-p_j)^{25(\tau_1+1)} < 48\cdot\Big(\frac{3}{4}\Big)^{19} < \frac{1}{4}.
\]

Hence, $\tau_u \le 25(\tau_1+1)$ and so the biased coupon collector's
problem yields the correct mixing time up to a factor of 25. Assume now that
$(1-p_{n-1})^{\tau_0} \le 3/4$ so that $(1-p_j)^{\tau_0+1} < 3/4$ for all
$j \le n-1$. Then

\[
(1-p_j)^{25(\tau_0+1)} < \Big(\frac{3}{4}\Big)^{19}(1-p_j)^{6(\tau_0+1)}
 < \frac{1}{192}(1-p_j)^{6(\tau_0+1)}
\]

and so

\[
\sum_{j=1}^{n-1}(1-p_j)^{25(\tau_0+1)}
 < \frac{1}{192}\sum_{j=1}^{n-1}(1-p_j)^{6(\tau_0+1)} < \frac{1}{4}.
\]

Thus, $\tau_u \le 25(\tau_0+1)$. The following theorem summarizes the
results of this section.
section. 



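Theorem 3.1. Consider the move-to-front scheme with card probabilities
$p_j$, $j \in [n]$, with $p_j \le 1/3$ for every $j$. Put

\[
\tau_u = \min\Big\{t : \sum_{j=1}^{n-1}(1-p_j)^t \le \frac{1}{4}\Big\}.
\]

Then

\[
\frac{1}{25}\tau_u - 1 \le \tau_{\mathrm{mix}} \le \tau_u.
\]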

Examples. (a) The ordinary random-to-top shuffle. Here $p_j = 1/n$ for
every $j$. We get $\tau_u = (1+o(1))n\log n$ and
$(1+o(1))\frac{1}{25}n\log n \le \tau_{\mathrm{mix}} \le (1+o(1))n\log n$.
Of course, it is well known that $n\log n$ is a cutoff in this case.
(b) Put



n 

i<i< o> 



, n+l ~ ~ 2 

Pi = 1 2 n 

n[n + \) 2 

Here t u = (1 + o(l))^n 2 logn, so the correct order of the mixing time is 
n 2 logn. 

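(c) Let $p_j = j^{-1}/(\sum_{k=1}^{n}k^{-1})$. Then with
$t = cn(\log n)^2$, it is readily seen that

\[
\sum_{j=1}^{n-1}(1-p_j)^t \to
\begin{cases}
\infty, & \text{when } c < 1, \\
0, & \text{when } c > 1.
\end{cases}
\]

Therefore, $\tau_u = (1+o(1))n(\log n)^2$ and the order of the mixing time
is $n(\log n)^2$.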

(d) Put $p_j = 2(n+1-j)/(n(n+1))$. With $t = cn^2$,

\[
\sum_{j=1}^{n-1}(1-p_j)^t \le \sum_{i=1}^{n-1}e^{-2ci} < \frac{1}{4}
\]

if, say, $c > 1$. Therefore, $n^2$ is the correct order of the mixing time.
This is thus a case where the time taken to reach stationarity depends
almost entirely on the time taken to find the few least probable cards.



4. Generalized Rudvalis shuffles. 



4.1. Lower bounds. The eigenvalues for the GR shuffles that are at least
reasonably accessible are those for the motion of a single card. These
typically turn out to be complex. Now taking a closer look at the proof of
Theorem 2.1 reveals that there is nothing in it that technically prevents
$\lambda := 1-\gamma$ from being complex-valued. However, the typical
situation is such that $\gamma$ is of much larger order than $1-|\lambda|$,
but it is the latter that indicates the correct mixing time. Therefore, one
would like a variant of Theorem 2.1, where one in effect works with
$1-|\lambda|$ rather than $\gamma$. Wilson [11] developed such a method in
order to take care of the original Rudvalis shuffle and some variants of it.
One ingredient in his method is to extend the Markov chain to a larger state
space by incorporating time into it. In our case such an extension seems
unnecessary and the following variant is the most natural way to attack the
problems at hand:

Let the setting be as in Theorem 2.1, with the exception that the eigenvalue
is now $(1-\gamma)e^{i\theta}$ for some $\gamma \in (0,1/2]$ and some
$\theta \in [0,\pi]$ and $R$ is such that

\[
R \ge \max_{s\in S}E[|e^{-i\theta}\Phi(X_{t+1})-\Phi(X_t)|^2 \mid X_t = s].
\]
Theorem 4.1. Assume the setting above, pick $a > 0$ and let $T$ be as in
Theorem 2.1. Then for $t \le T$,

\[
\|P(X_t\in\cdot)-\pi\| \ge 1-a.
\]

Proof. As in the proof of Theorem 2.1,
$E\Phi(X_t) = (1-\gamma)^t e^{i\theta t}\Phi(s_0)$ and
$E\Phi(X_\infty) = 0$. Put, for $t = 0,1,2,3,\ldots$,

\[
\Psi_t(X_t) = e^{-i\theta t}\Phi(X_t).
\]

Then for every $t$, $E\Psi_t(X_t) = (1-\gamma)^t\Phi(s_0)$ and
$E\Psi_t(X_\infty) = 0$ and

\[
R \ge \max_{s\in S}E[|\Psi_{t+1}(X_{t+1})-\Psi_t(X_t)|^2 \mid X_t = s].
\]

Now completely mimic the proof of Theorem 2.1 with $\Psi_t(X_t)$ playing the
role of $\Phi(X_t)$. Doing so yields the desired result. □

Recall that in the general GR shuffle setting, the card moved to the top
is chosen from position $k$ with probability $p_k$, $k \in [n]$. This
entails that the motion of a single card, $c$, is in itself a Markov chain:
Given that $c$ is in position $k$ at time $t$, then at time $t+1$, $c$ is in
position 1 with probability $p_k$, in position $k$ with probability
$m_k := \sum_{j<k}p_j$ and in position $k+1$ with probability
$M_k := \sum_{j>k}p_j$. Putting $\lambda$ for an eigenvalue of this chain
and $x$ for a corresponding eigenvector, these must satisfy the equations

\[
\lambda x_k = p_k x_1 + m_k x_k + M_k x_{k+1},
\]

$k = 1,2,\ldots,n$. Unfortunately this system of equations is very difficult
to solve in general, so from now on we focus on some special cases. However,
even so the exact eigenvalues cannot be expressed in closed form and we will
consequently only be able to produce expressions that are very close to the
eigenvalue. When arguing that a given expression is indeed close to an
eigenvalue, the following lemma is very useful. The lemma should be more or
less well known, but we have not been able to find any specific reference,
so a proof is supplied.
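
The single-card chain is simple to write down explicitly. The sketch below builds its transition matrix from the position probabilities, using 0-based indices (index 0 is the top of the deck); the function name is an assumption made for illustration.

```python
def single_card_matrix(p):
    """Transition matrix of one card's position under a position-biased random-to-top shuffle.

    p[i] is the probability that the card at position i (0 = top) is moved to the top.
    """
    n = len(p)
    P = [[0.0] * n for _ in range(n)]
    for k in range(n):
        P[k][0] += p[k]              # our card itself is picked and goes to the top
        P[k][k] += sum(p[:k])        # m_k: a card above is picked, our card stays put
        if k + 1 < n:
            P[k][k + 1] += sum(p[k + 1:])  # M_k: a card below is picked, ours moves down
    return P


# Bottom-to-top shuffle with n = 5 and k = 2: the two bottom positions are equally likely.
P = single_card_matrix([0, 0, 0, 0.5, 0.5])
for row in P:
    print(row)
```

Row `k` of the matrix encodes exactly the right-hand side of the eigenvalue equation for position `k + 1` in the paper's 1-based numbering.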

Lemma 4.2. Let $\mathbb{D}$ be the closed unit disc in the complex plane.
Assume that $f : \mathbb{D} \to \mathbb{C}$ is analytic, $f(0) = 0$ and
$|f'(z)| \ge 1$ for every $z$ in $\mathbb{D}$. Then there exists a point
$z_0 \in \mathbb{D}$ such that $f(z_0) = 1$.

Proof. Let $V := f(\mathbb{D})$ and let $u$ be the leftmost point of the
positive real axis that intersects the boundary of $V$; formally,

\[
u := \inf\{a \in [0,\infty) : a \notin V\}.
\]

We want to prove that $f^{-1}[0,u]$ contains a smooth path from 0 to
$\partial\mathbb{D}$. The set $f^{-1}[0,u]$ may not be connected, but put
$\rho$ for the component of $f^{-1}[0,u]$ that contains the origin. We claim
that $\rho$ is a path of the desired type. By the assumption on $f'$, the
set $\rho$ can be covered by open balls on which the restriction of $f$ is
an open bijective map with analytic inverse. Since $\rho$ is compact, this
cover can be taken to be finite; put $U_1, U_2, \ldots, U_n$ for the
covering balls. Since $f|_{U_j}$ has a well-defined analytic inverse,
$\rho \cap U_j$ is a smooth path segment for every $j$. Putting the $U_j$'s
together shows that $\rho$ is a smooth path and since $\rho$ is closed, it
contains its endpoints. Finally, to prove that the endpoint, $w \ne 0$, of
$\rho$ is on $\partial\mathbb{D}$, assume for contradiction that
$w \in \mathbb{D}^\circ$. Then $w$ is the center of an open ball, $O$,
contained in $\mathbb{D}^\circ$ on which $f$ has analytic inverse. Since
$f(O) \subset V^\circ$, $f(O)$ is crossed by $[0,u]$ and, hence, $O$ is
crossed by a segment of $\rho$, contradicting that $w$ is the endpoint.

Next we claim that $f|_\rho$ is a bijection onto $[0,u]$: If this was not
the case, then by continuity, for some $j$, $f|_{\rho\cap U_j}$ would not be
bijective. (We may regard $f|_\rho$ as a continuous real-valued function
defined on a closed interval of $\mathbb{R}$.) This contradicts the
bijectivity of $f|_{U_j}$.

Thus, $\rho$ can be parameterized the natural way, that is, by letting
$\rho(s) = f^{-1}(s)$, $s \in [0,u]$. For an arbitrary $z \in \rho$, write
$z = \rho(s)$, $s \in [0,u]$, and get, by Cauchy's theorem,

\[
s = f(z) = \int_0^s f'(\rho(v))\rho'(v)\,dv.
\]

Hence, $f'(\rho(v))\rho'(v) = 1$ for almost every $v$. Whence, by
hypothesis,

\[
u = \int_0^u |f'(\rho(v))|\,|\rho'(v)|\,dv \ge \int_0^u |\rho'(v)|\,dv.
\]

However, the right-hand side is the length of $\rho$ and since $\rho$ goes
from the origin to the boundary of $\mathbb{D}$, its length is at least 1.
Thus, $u \ge 1$, that is, $f(\mathbb{D})$ contains 1, as desired. □

A slight strengthening of the proof of Lemma 4.2 proves that, in fact,
$f(\mathbb{D}) \supseteq \mathbb{D}$: Replace the positive real axis
$[0,\infty)$ with any ray $e^{i\theta}[0,\infty)$, $\theta \in [0,2\pi)$,
and redefine $u$ as $u := \inf\{a \in [0,\infty) : ae^{i\theta} \notin V\}$.
Then, repeating the proof with only a small adjustment for the
$e^{i\theta}$-factor again gives $u \ge 1$. Since $\theta$ was arbitrary, we
have shown the following:

Theorem 4.3. Let $f : \mathbb{D} \to \mathbb{C}$ be analytic with
$f(0) = 0$ and $|f'(z)| \ge 1$ for every $z \in \mathbb{D}$. Then
$f(\mathbb{D}) \supseteq \mathbb{D}$.

The way in which Lemma 4.2 will be used is the following: Suppose that
for some $z_0$ we have that $f(z_0) = \delta$. Suppose also that
$|f'(z)| \ge M$ for all $z$ within distance $|\delta|/M$ of $z_0$. Then by
Lemma 4.2 applied to the map $z \mapsto 1 - f(z_0 + \delta z/M)/\delta$,
$f$ has a zero within distance $|\delta|/M$ of $z_0$.
4.1.1. The bottom-to-top shuffle. Recall that now the card taken to the
top of the deck is chosen uniformly among the bottom $k$ cards. As above,
put $\lambda$ and $x$ for an eigenvalue and a corresponding eigenvector for
single-card motion and assume without loss of generality that $x_1 = 1$.
Then the above system of equations for $(\lambda, x)$ becomes

\[
\begin{aligned}
\lambda &= x_2, \\
\lambda x_2 &= x_3, \\
\lambda x_3 &= x_4, \\
&\;\;\vdots \\
\lambda x_{n-k} &= x_{n-k+1}, \\
\lambda x_{n-k+1} &= \frac{1}{k} + \frac{k-1}{k}x_{n-k+2}, \\
\lambda x_{n-k+2} &= \frac{1}{k} + \frac{1}{k}x_{n-k+2} + \frac{k-2}{k}x_{n-k+3}, \\
\lambda x_{n-k+3} &= \frac{1}{k} + \frac{2}{k}x_{n-k+3} + \frac{k-3}{k}x_{n-k+4}, \\
&\;\;\vdots \\
\lambda x_{n-1} &= \frac{1}{k} + \frac{k-2}{k}x_{n-1} + \frac{1}{k}x_n, \\
\lambda x_n &= \frac{1}{k} + \frac{k-1}{k}x_n.
\end{aligned}
\]

Working backward, we get from the last equation that

\[
x_n = \frac{1}{k\lambda-(k-1)}.
\]

The second to last equation gives

\[
x_{n-1} = \frac{1/k + (1/k)x_n}{\lambda-(k-2)/k} = \frac{1}{k\lambda-(k-1)} = x_n
\]

or $\lambda = (k-2)/k$. Unless $\lambda = (k-2)/k$, the third to last
equation now tells us that

\[
x_{n-2} = \frac{1/k + (2/k)x_{n-1}}{\lambda-(k-3)/k} = \frac{1}{k\lambda-(k-1)}
\]

or $\lambda = (k-3)/k$. By induction,

\[
x_{n-k+1} = x_{n-k+2} = \cdots = x_n = \frac{1}{k\lambda-(k-1)}
\]

or $\lambda \in \{0, 1/k, 2/k, \ldots, (k-2)/k\}$. By the first $n-k$
equations, $x_1 = 1$, $x_2 = \lambda$, $x_3 = \lambda^2$, ...,
$x_{n-k+1} = \lambda^{n-k}$. Unless
$\lambda \in \{0, 1/k, 2/k, \ldots, (k-2)/k\}$, the two expressions for
$x_{n-k+1}$ must be equal. This gives the following characteristic equation
for $\lambda$:

\[
g(\lambda) := \lambda^{n-k+1} - \frac{k-1}{k}\lambda^{n-k} - \frac{1}{k} = 0.
\]

Now assume that $k = o(n)$. Then no eigenvalue less than $1-2/k$ will
provide a good lower bound so we can focus on $\lambda$'s solving the
characteristic equation. So how do we find solutions to this equation? It is
fairly easy to guess that there may be a solution close to
$\lambda = e^{iw}$, where $w = 2\pi/n$, so let us first simply put
$\lambda = (1-\gamma)e^{iw}$, insert this into the equation and make an
approximate calculation of what $\gamma$ then should be. We have, if
$\gamma$ is assumed to be very small,

\[
g(\lambda) \approx (1-n\gamma)e^{-i(k-1)w} - \frac{k-1}{k}(1-n\gamma)e^{-ikw} - \frac{1}{k}.
\]

The imaginary part of this is very small. The real part is approximately

\[
(1-n\gamma)\Big(1-\frac{(k-1)^2}{2}w^2\Big)
 - \frac{k-1}{k}(1-n\gamma)\Big(1-\frac{k^2w^2}{2}\Big) - \frac{1}{k}
 \approx -\frac{n\gamma}{k} + \Big(\frac{k(k-1)}{2}-\frac{(k-1)^2}{2}\Big)w^2.
\]

To make the last expression vanish, we must put

\[
\gamma = \binom{k}{2}\frac{w^2}{n}.
\]

We thus guess that $g(\lambda)$ has a zero very close to
$\lambda_0 := (1-\binom{k}{2}w^2/n)e^{iw}$. We now need to prove that this
is indeed the case. To make things a bit easier to handle, transform the
characteristic equation by putting $z = \lambda e^{-iw}$, thereby getting

\[
f(z) := z^{n-k+1} - \frac{k-1}{k}e^{-iw}z^{n-k} - \frac{1}{k}e^{i(k-1)w} = 0,
\]

for which we want to prove there is a root very close to
$z_0 := 1-\binom{k}{2}w^2/n$. More precisely, we will use Lemma 4.2 to show
that the distance from $z_0$ to the nearest root is
$O(k^3n^{-4}) = o(k^2w^2/n)$. Whence, there is a root of the form
$1-(1+o(1))\binom{k}{2}w^2/n$. First we calculate $f'$:

\[
f'(z) = \Big((n-k+1)z - \frac{(n-k)(k-1)}{k}e^{-iw}\Big)z^{n-k-1}.
\]
Next we evaluate $f'(z)$ for an arbitrary point $z$ such that
$|z-z_0| < \binom{k}{2}w^2/(2n)$. Such a point can be written as
$1-c\binom{k}{2}w^2/n$, where $|c| \in (1/2, 3/2)$. We have



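\[
\begin{aligned}
f'(z) &= (1-o(1))\Big((n-k+1)\Big(1-c\binom{k}{2}\frac{w^2}{n}\Big)
 - \frac{(n-k)(k-1)}{k}\big((1-O(n^{-2})) + i(-w+O(n^{-3}))\big)\Big) \\
 &= (1-o(1))\Big(\frac{n}{k} - O(k^2n^{-2}) + O(n^{-1})
 + i\Big(\frac{(n-k)(k-1)}{k}w + O(n^{-2})\Big)\Big).
\end{aligned}
\]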



Thus, $f'(z) = (1+o(1))n/k$ for all $z$ within distance
$\binom{k}{2}w^2/(2n)$ of $z_0$. If it can now be shown that
$f(z_0) = O(k^2n^{-3})$, it will follow from Lemma 4.2 that $f$ has a zero
within distance $O(k^3n^{-4})$ of $z_0$, as desired. However,

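\[
\begin{aligned}
f(z_0) &= 1 - \frac{n-k+1}{n}\binom{k}{2}w^2
 - \frac{k-1}{k}\Big(1-\frac{n-k}{n}\binom{k}{2}w^2\Big)\Big(1-iw-\frac{w^2}{2}\Big) \\
 &\quad - \frac{1}{k}\Big(1+i(k-1)w-\frac{(k-1)^2w^2}{2}\Big) + O(k^2n^{-3}) \\
 &= \frac{(k-1)(n-k)-k(n-k+1)}{kn}\binom{k}{2}w^2
 + \frac{(k-1)w^2}{2k} + \frac{(k-1)^2w^2}{2k} + O(k^2n^{-3})
\end{aligned}
\]

and some algebraic manipulation reveals that all terms but the
$O(k^2n^{-3})$-term at the end cancel.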
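
The location of this root can also be checked numerically. The sketch below is not the paper's argument (which uses Lemma 4.2); it swaps in a plain Newton iteration on the characteristic polynomial $g(\lambda) = \lambda^{n-k+1} - \frac{k-1}{k}\lambda^{n-k} - \frac{1}{k}$, started at the guess $\lambda_0 = (1-\binom{k}{2}w^2/n)e^{iw}$ with $w = 2\pi/n$. The function names, the number of iterations and the choice $n = 200$, $k = 4$ are assumptions made for illustration.

```python
import cmath
from math import comb, pi


def g(lam, n, k):
    """Characteristic polynomial of the single-card chain for the bottom-to-top shuffle."""
    return lam ** (n - k + 1) - (k - 1) / k * lam ** (n - k) - 1 / k


def g_prime(lam, n, k):
    """Derivative of g with respect to lambda."""
    return (n - k + 1) * lam ** (n - k) - (k - 1) / k * (n - k) * lam ** (n - k - 1)


def eigenvalue_near_guess(n, k, iterations=30):
    """Newton iteration started at lambda_0 = (1 - C(k,2) w^2 / n) e^{iw}, w = 2 pi / n."""
    w = 2 * pi / n
    lam = (1 - comb(k, 2) * w * w / n) * cmath.exp(1j * w)
    for _ in range(iterations):
        lam -= g(lam, n, k) / g_prime(lam, n, k)
    return lam


n, k = 200, 4
lam = eigenvalue_near_guess(n, k)
gamma_guess = comb(k, 2) * (2 * pi / n) ** 2 / n
print(abs(g(lam, n, k)))                # residual of the characteristic equation
print((1 - abs(lam)) / gamma_guess)     # ratio of the true gap to the guessed gap
print(cmath.phase(lam) / (2 * pi / n))  # ratio of the true angle to w
```

Both ratios should be close to 1 when $k$ is small compared with $n$, in line with the $(1+o(1))$ factors in the text.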

We have thus established for the bottom-to-top shuffle an eigenvalue of
the form $\lambda = (1-\gamma)e^{i\theta}$, where
$\gamma = (1+o(1))\frac{2\pi^2k(k-1)}{n^3}$ and
$\theta = (1+o(1))\frac{2\pi}{n}$, and a corresponding eigenvector given by

\[
x = [1, \lambda, \lambda^2, \ldots, \lambda^{n-k}, \lambda^{n-k}, \ldots, \lambda^{n-k}]^T.
\]

Now let $Z_t^j$ denote the position of card $j$ at time $t$, put
$m = \lfloor n/2 \rfloor$, $\Phi_j(X_t) = x_{Z_t^j}$ and

\[
\Phi(X_t) = \sum_{j=1}^{m}\Phi_j(X_t).
\]
3=1 
By linearity of expectation and what we just showed,
$E[\Phi(X_{t+1}) \mid X_t] = \lambda\Phi(X_t)$. To apply Theorem 4.1 with
$\Phi$ as test function, we need to check
$\max_{s\in S}E[|e^{-i\theta}\Phi(X_{t+1})-\Phi(X_t)|^2 \mid X_t = s]$.
However, deterministically, all the $n-k$ top cards move one step down the
deck at each shuffle, and for such a card, $j$, on position $r$,

\[
|e^{-i\theta}\Phi_j(X_{t+1})-\Phi_j(X_t)| = |e^{-i\theta}\lambda^{r+1}-\lambda^r|
 \le \gamma = O(k^2n^{-3}).
\]

For the card that is moved to the top, $\Phi_j$ changes from
$\lambda^{n-k}$ to 1, a change of order $O(kn^{-1})$. Finally, for $k-1$ of
the cards at the bottom of the deck, their corresponding $\Phi_j$'s stay put
at $\lambda^{n-k}$, so for these cards,

\[
|e^{-i\theta}\Phi_j(X_{t+1})-\Phi_j(X_t)| = (1+o(1))|1-e^{iw}| = O(n^{-1}).
\]

Using the triangle inequality,

\[
|e^{-i\theta}\Phi(X_{t+1})-\Phi(X_t)| \le mO(k^2n^{-3}) + O(kn^{-1}) + kO(n^{-1}) = O(kn^{-1})
\]

and, consequently,

\[
\max_{s\in S}E[|e^{-i\theta}\Phi(X_{t+1})-\Phi(X_t)|^2 \mid X_t = s] = O(k^2n^{-2}).
\]

We may thus apply Theorem 4.1 with $R = O(k^2n^{-2})$. Observing that
$\Phi(s_0) = \Theta(n)$ if we start with the cards in order (this is why we
use only half the cards when defining $\Phi$), and putting $a = 1/2$, we
then get the following lower bound when $k = o(n)$:

T =( 1+ » (1 »«tl)( b6{ ('«»-^T) 

. . , , n 3 / 1 k 2 n~ 2 

= (1 + ° (1)) 2vr^-l) I ° g " " 2 bg ^ 

= (1 + ° (1)) 4^fc(fe-l) lQgn - 
Now what about the case $k = \Theta(n)$? In this case it is fairly easy to find
a lower bound of order $n \log n$, just as one would expect from the $k = o(n)$
lower bound just given. This can be done either by using the $1 - 2/k$
eigenvalue and the corresponding eigenvector in Theorem 4.1 or by softer
probabilistic reasoning as for the (inverse) random-to-top shuffle: after
$cn \log n$ steps, for a small enough constant $c$, card $n$ will still be very
close to the bottom of the deck. However, an intriguing fact about these
shuffles is that when $k$ is really large, for example, $k > 2n/3$, the second
eigenvalue for the single card chain is farther away from 1 than for the case
$k = n$. This seems to suggest that the bottom-to-top shuffle with, say,
$k = 0.9n$, could in fact be faster than the random-to-top shuffle.
Unfortunately we have not been able to find any good evidence about whether
this is indeed the case, and it would be quite interesting to know the answer
to this problem.

In summary, we have found the following result:

Theorem 4.4. A lower bound on the mixing time for the bottom-to-top
shuffle where the card taken to the top at a given shuffle is chosen among
the bottom $k = o(n)$ cards is given by

$$(1 + o(1))\frac{n^3}{4\pi^2 k(k-1)}\log n.$$

When $k = \Theta(n)$, the mixing time is $\Omega(n \log n)$.

Note that taking $k = 2$ in Theorem 4.4 recovers the previously known
$n^3 \log n$ lower bound for the Rudvalis shuffle.
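
As a rough numerical sanity check of the eigenvalue asymptotics behind Theorem 4.4, the following sketch runs Newton's method on a single-card characteristic function. The function $f(z) = z^{n-k+1} - (1 - 1/k)z^{n-k} - 1/k$ used here is an assumption: it is what one obtains by inserting the eigenvector $[1, \lambda, \ldots, \lambda^{n-k}, \ldots, \lambda^{n-k}]^T$ into the single-card transition equations, and it may differ in form from the function analysed above. The starting point, the function names and the test sizes are likewise illustrative choices.

```python
import cmath
import math


def predicted_gamma(n: int, k: int) -> float:
    """Leading-order prediction gamma = 2*pi^2*k*(k-1)/n^3 (k >= 2)."""
    return 2 * math.pi ** 2 * k * (k - 1) / n ** 3


def bottom_to_top_gamma(n: int, k: int, iterations: int = 50) -> float:
    """Return 1 - |z| for the root of the assumed f near the predicted eigenvalue.

    Assumed characteristic function (derived here, not quoted from the paper):
        f(z) = z^(n-k+1) - (1 - 1/k) * z^(n-k) - 1/k
    """
    w = 2 * math.pi / n
    # Start Newton's method at the predicted eigenvalue (1 - gamma) * e^{iw}.
    z = (1 - predicted_gamma(n, k)) * cmath.exp(1j * w)
    for _ in range(iterations):
        f = z ** (n - k + 1) - (1 - 1 / k) * z ** (n - k) - 1 / k
        df = (n - k + 1) * z ** (n - k) - (1 - 1 / k) * (n - k) * z ** (n - k - 1)
        z -= f / df
    return 1 - abs(z)


if __name__ == "__main__":
    for n, k in [(200, 2), (200, 3), (400, 5)]:
        ratio = bottom_to_top_gamma(n, k) / predicted_gamma(n, k)
        print(f"n={n}, k={k}: numerical gamma / predicted gamma = {ratio:.4f}")
```

Under these assumptions the printed ratios should approach 1 as $k/n$ decreases, which is all the $(1 + o(1))$ factor promises.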

4.1.2. GR shuffles with $p_{n-k} = p_n = 1/2$. To avoid parity problems, we
must assume that $k$ is odd. Using the same notation as for the bottom-to-top
shuffle above, the equations for the eigenvalue/eigenvector pairs now become

$$\begin{aligned}
x_1 &= 1, \\
\lambda x_j &= x_{j+1}, && j = 1, 2, \ldots, n-k-1, \\
\lambda x_{n-k} &= \tfrac{1}{2} + \tfrac{1}{2}x_{n-k+1}, \\
\lambda x_j &= \tfrac{1}{2}x_j + \tfrac{1}{2}x_{j+1}, && j = n-k+1, n-k+2, \ldots, n-1, \\
\lambda x_n &= \tfrac{1}{2} + \tfrac{1}{2}x_n.
\end{aligned}$$

Solving forward using all but the last equation, we get

$$x_1 = 1,\quad x_2 = \lambda,\quad \ldots,\quad x_{n-k} = \lambda^{n-k-1},$$

$$x_{n-k+1} = 2\lambda^{n-k} - 1,$$

$$x_{n-k+2} = (2\lambda - 1)(2\lambda^{n-k} - 1),\quad \ldots,\quad x_n = (2\lambda - 1)^{k-1}(2\lambda^{n-k} - 1).$$

Invoking the last equation, we also have $x_n = 1/(2\lambda - 1)$ and since the
two expressions for $x_n$ must agree, we get the characteristic equation

$$g(\lambda) := (2\lambda - 1)^k(2\lambda^{n-k} - 1) - 1 = 0.$$

Again we make separate treatments of the cases $k = o(n)$ and $k = \Theta(n)$,
beginning with the former.

When $k = o(n)$, we suspect that a useful eigenvalue can be found quite
close to 1. Therefore, $2\lambda - 1 \approx \lambda^2$, so it is a good idea to
start with the equation one gets by making this replacement:

$$f(\lambda) := \lambda^{n+k} - \tfrac{1}{2}\lambda^{2k} - \tfrac{1}{2} = 0.$$

Putting $\mu := \lambda^k$ and $n = ck$, we get

$$\mu^{c+1} - \tfrac{1}{2}\mu^2 - \tfrac{1}{2} = 0.$$

Since $c = \Omega(1)$, this equation resembles the characteristic equation for
the bottom-to-top shuffle. It has a solution very close to
$\bigl(1 - \frac{k^3 w^2}{2n}\bigr)e^{ikw}$, and so a zero of $f(\lambda)$ can be
found very close to

$$\lambda = \lambda_0 := \Bigl(1 - \frac{k^2 w^2}{2n}\Bigr)e^{iw}.$$

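Let us check how close to a solution $\lambda_0$ is:

$$\begin{aligned}
f(\lambda_0) &= \Bigl(1 - \frac{(n+k)k^2 w^2}{2n} + O(k^4 n^{-4})\Bigr)\Bigl(1 - \frac{k^2 w^2}{2} + ikw + O(k^3 n^{-3})\Bigr) \\
&\quad - \frac{1}{2}\bigl(1 + O(k^3 n^{-3})\bigr)\bigl(1 - 2k^2 w^2 + 2ikw + O(k^3 n^{-3})\bigr) - \frac{1}{2} \\
&= -\frac{1}{2}k^2 w^2 - \frac{1}{2}k^2 w^2 + k^2 w^2 + O(k^3 n^{-3}) = O(k^3 n^{-3}).
\end{aligned}$$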



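Since

$$f'(\lambda) = (n+k)\lambda^{n+k-1} - k\lambda^{2k-1} = (1 + o(1))n$$

in a large enough surrounding of $\lambda_0$, Lemma 4.2 implies that $f$ has a
zero in a point $\lambda_1$ at distance $O(k^3 n^{-4})$ from $\lambda_0$. Next we
check how far off $\lambda_1$ is from being a zero of $g$. Compare
$\lambda_1^2$ and $2\lambda_1 - 1$:

$$\begin{aligned}
2\lambda_1 - 1 - \lambda_1^2 &= 2\lambda_0 - 1 - \lambda_0^2 + O(k^3 n^{-5}) \\
&= 2\Bigl(1 - \frac{k^2 w^2}{2n}\Bigr)\Bigl(1 - \frac{w^2}{2} + iw + O(n^{-3})\Bigr) - 1 \\
&\quad - \Bigl(1 - \frac{k^2 w^2}{n} + O(k^4 n^{-6})\Bigr)\bigl(1 - 2w^2 + 2iw + O(n^{-3})\bigr) + O(k^3 n^{-5}) \\
&= w^2 + o(n^{-2}).
\end{aligned}$$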

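Thus,

$$(2\lambda_1 - 1)^k = \lambda_1^{2k} + kw^2 + o(kn^{-2}).$$

Therefore, since $\lambda_1$ is a zero of $f$,

$$\begin{aligned}
1 + g(\lambda_1) &= \bigl(\lambda_1^{2k} + kw^2 + o(kn^{-2})\bigr)(2\lambda_1^{n-k} - 1) \\
&= 1 + 2f(\lambda_1) + (1 + o(1))\bigl(kw^2 + o(kn^{-2})\bigr) \\
&= 1 + kw^2 + o(kn^{-2}),
\end{aligned}$$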
so $g(\lambda_1) = (1 + o(1))kw^2$. However, $g'(\lambda) = (1 + o(1))2n$ in a
sufficiently large surrounding of $\lambda_1$ to allow us to appeal to Lemma 4.2
for the conclusion that $g$ has a zero in a point $\lambda_2$ at distance
$(1 + o(1))kw^2/(2n)$ from $\lambda_1$. By the triangle inequality,

$$|\lambda_2 - \lambda_0| \le (1 + o(1))\frac{kw^2}{2n} + O(k^3 n^{-4})$$

and so $\lambda_2$, the eigenvalue we set out looking for, can be written as

$$\lambda = (1 - \gamma)e^{i\theta},$$

where

$$(1 + o(1))\frac{2\pi^2 k(k-1)}{n^3} \le \gamma \le (1 + o(1))\frac{2\pi^2 k(k+1)}{n^3}$$

and $\theta = (1 + o(1))w$.

Next we apply Theorem 4.1 with the eigenvalue and corresponding eigenvector,
$x$, we have just found. For card $j$, put $\Phi_j(X_t) = x_{Z_t^j}$,
$m = \lfloor n/2 \rfloor$ and

$$\Phi(X_t) = \sum_{j=1}^{m} \Phi_j(X_t).$$

To bound $\Delta := |e^{-i\theta}\Phi(X_{t+1}) - \Phi(X_t)|$, observe that the
top $n - k$ cards, just as for the bottom-to-top shuffle, deterministically
move one step down the deck. Thus, each of them contributes
$(1 + o(1))\gamma = O(k^2 n^{-3})$ to $\Delta$, so by the triangle inequality,
they cannot together contribute more than $O(k^2 n^{-2})$. Of the $k$ cards
between position $n - k$ and position $n$, $k - 1$ cards will move one step up
the deck or stay put and will thereby individually contribute $O(n^{-1})$ and
together at most $O(kn^{-1})$ to $\Delta$. Finally, one card will move from one
of the bottom $k$ positions to the top, thereby contributing $O(kn^{-1})$ to
$\Delta$. Summing up, we get that $\Delta$ is bounded by $O(kn^{-1})$ and
Theorem 4.1 may be applied with $R = O(k^2 n^{-2})$. Putting $a = 1/2$ and
noting that $\Phi(s_0) = \Theta(n)$ if $s_0$ has the cards in order, we get
almost the same lower bound that we got for the bottom-to-top shuffles:

Theorem 4.5. Consider the GR shuffle where $p_{n-k} = p_n = 1/2$ and $k$
is odd. If $k = o(n)$, then a lower bound on the mixing time is given by

$$(1 + o(1))\frac{n^3}{4\pi^2 k(k+1)}\log n.$$

Again note that we retrieve the previously known lower bound for the
Rudvalis shuffle, this time by putting $k = 1$.
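
The location of this eigenvalue can be checked numerically with a short sketch that applies Newton's method directly to the characteristic equation $g(\lambda) = (2\lambda - 1)^k(2\lambda^{n-k} - 1) - 1 = 0$, started at $\lambda_0 = (1 - k^2 w^2/(2n))e^{iw}$ with $w = 2\pi/n$ (the value of $w$ is an assumption taken from the asymptotics above). Function names and test sizes are illustrative only.

```python
import cmath
import math


def gamma_bounds(n: int, k: int) -> tuple[float, float]:
    """Leading-order lower and upper bounds for gamma stated in the text."""
    lower = 2 * math.pi ** 2 * k * (k - 1) / n ** 3
    upper = 2 * math.pi ** 2 * k * (k + 1) / n ** 3
    return lower, upper


def gr_gamma(n: int, k: int, iterations: int = 50) -> float:
    """Return 1 - |lambda| for the root of g reached by Newton's method from lambda_0."""
    w = 2 * math.pi / n
    lam = (1 - k * k * w * w / (2 * n)) * cmath.exp(1j * w)  # starting guess lambda_0
    for _ in range(iterations):
        a = 2 * lam - 1
        b = 2 * lam ** (n - k) - 1
        g = a ** k * b - 1
        # Product rule: d/dlam [a^k * b], with a' = 2 and b' = 2(n-k) lam^(n-k-1).
        dg = 2 * k * a ** (k - 1) * b + a ** k * 2 * (n - k) * lam ** (n - k - 1)
        lam -= g / dg
    return 1 - abs(lam)


if __name__ == "__main__":
    for n, k in [(100, 1), (200, 3), (400, 5)]:
        lower, upper = gamma_bounds(n, k)
        print(f"n={n}, k={k}: lower={lower:.3e}, gamma={gr_gamma(n, k):.3e}, upper={upper:.3e}")
```

The comparison is only meaningful up to the $(1 + o(1))$ factors, so the computed $\gamma$ may sit marginally outside the printed interval for small $n$.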
The case $k = \Omega(n)$ is again a little more difficult since it is harder to
tell, as exactly as for the case $k = o(n)$, where the second eigenvalue is to
be found in general. However, it is fairly easy to show that there is an
eigenvalue for which one has $\gamma = \Theta(n^{-1})$ and
$\theta = \Theta(n^{-1})$ and that there are no other nontrivial eigenvalues
with $\gamma$ of lower order than this. Applying Theorem 4.1 with the
corresponding eigenvector then gives a lower bound of $\Omega(n \log n)$, as
expected from the lower bound for $k = o(n)$. With some care, one can come up
with more explicit bounds if $k$ is specified. For example, when $k = n/2$,
the same methods as those used above yield an eigenvalue
$(1 - \gamma)e^{i\theta}$, where
7 = (1 + o(l))-^!p and 9 = (1 + o(l))|u>. Applying Theorem 4.1 then gives 
the lower bound 

r = (l + o(l))^^nlogn. 

However, this is not, at least in general, the correct order of the mixing
time. To show this, let us give the case $k = n/2$ some extra consideration:
Let $Z_t$ be the position of a specific card at time $t$ (counting
$0, \ldots, n-1$ for the positions rather than $1, \ldots, n$). Put

$$U_t := Z_t + (Z_t - k)^+ - t \bmod k$$

and $V_t = U_t - U_{t-1} \bmod k$. Then $E[V_t \mid X_{t-1}] = 0$ and
$\mathrm{Var}(V_t \mid X_{t-1})$ is 1 when $Z_{t-1} \ge k$ and 0 when
$Z_{t-1} < k$. Thus,

$$\mathrm{Var}\,U_t = E[E[U_t^2 \mid X_{t-1}]] = E[U_{t-1}^2 + E[V_t^2 \mid X_{t-1}]] \le \mathrm{Var}\,U_{t-1} + 1,$$

so by induction, $\mathrm{Var}\,U_t \le t$. However, then, by Chebyshev's
inequality,

$$P\bigl(|U_{10^{-6}n^2} - U_0| > 0.01n\bigr) \le 0.01,$$

while at stationarity a deviation like this would have a probability of more
than 0.9. We have shown the following:

Theorem 4.6. Suppose that $n$ is even. Then the mixing time of the GR
shuffle with $p_n = p_{n/2} = 1/2$ is $\Omega(n^2)$.
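
The key property of $U_t$ in this argument, namely that its increment is 0 while the card is in the top half and $\pm 1$ with equal probability while it is in the bottom half, can be verified exhaustively for small $n$. The sketch below assumes 0-indexed positions and assumes that one step of the GR shuffle with $k = n/2$ moves, with probability 1/2 each, either the bottom card or the card at 0-indexed position $k - 1$ to the top; this reading is inferred from the eigenvector equations of Section 4.1.2. All function names are hypothetical.

```python
def u_value(z: int, t: int, k: int) -> int:
    """U_t = Z_t + (Z_t - k)^+ - t (mod k), for a card at 0-indexed position z."""
    return (z + max(z - k, 0) - t) % k


def next_positions(z: int, n: int) -> tuple[int, int]:
    """The two equally likely next positions of a card at position z when k = n/2."""
    k = n // 2
    # Move A (assumed): the bottom card goes to the top, every other card shifts down.
    after_bottom = 0 if z == n - 1 else z + 1
    # Move B (assumed): the card at position k-1 goes to the top, cards above it
    # shift down, cards below it stay put.
    if z < k - 1:
        after_middle = z + 1
    elif z == k - 1:
        after_middle = 0
    else:
        after_middle = z
    return after_bottom, after_middle


def centered_increments(z: int, t: int, n: int) -> list[int]:
    """Sorted increments of U (mod k, represented in (-k/2, k/2]) for both moves."""
    k = n // 2
    result = []
    for z_new in next_positions(z, n):
        d = (u_value(z_new, t + 1, k) - u_value(z, t, k)) % k
        result.append(d - k if d > k // 2 else d)
    return sorted(result)


if __name__ == "__main__":
    n = 10
    for z in range(n):
        print(z, centered_increments(z, t=0, n=n))
```

For $n \ge 8$ the centering is unambiguous; smaller decks would need a different representative of the residue class.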

The proof of Theorem 4.6 can be made to work when $k = cn$ for any
rational constant $c$ and then gives a lower bound of order $n^2$. However, we
have not been able to turn this $\Omega(n^2)$ lower bound for single cards into
an $\Omega(n^2 \log n)$ bound for the whole deck, mainly because of the strong
dependence between the motions of different cards. When $c$ is irrational,
things are more unclear; it is not hard to see that in this case the mixing
time for single cards is in fact $O(n \log n)$, but it seems hard to imagine
that the whole deck would mix much faster for irrational values of $c$.

What the true order of mixing is for $p_n = p_{n-k} = 1/2$ seems to be a wide
open question. One natural guess is that the mixing time is
$\Theta(n^3 \log n)$ for all $k$, and either a proof or a counterexample to
this would be very interesting.
4.2. Upper bounds. As already pointed out, we will only be able to provide
a good upper bound for the bottom-to-top shuffle. Thus, we are again
considering the situation where at each time step the card moved to the
top of the deck is chosen uniformly at random from the $k$ bottom cards.
As usual, denote the state of the deck at time $t$ by $X_t$, and let $X_0$ be
any fixed state. (Since the bottom-to-top shuffle describes a simple random
walk on $S_n$, the stationary distribution is uniform and the particular
starting state does not affect the convergence rate.) For each
$t = 0, 1, 2, 3, \ldots$, let the mapping $\beta_t : S_n \to S_n$ be given by

$$\beta_t(\sigma) = (n-1\;\; n-2\;\; \ldots\;\; 1\;\; 0)^t \circ \sigma$$

(where the positions of the permutations are for convenience now denoted
$0, 1, 2, \ldots, n-1$). Put $Y_t = \beta_t(X_t)$. In words, $Y_t$ is $X_t$ with
position $j + t$ (modulo $n$) regarded as position $j$,
$j = 0, 1, \ldots, n-1$. Clearly,

$$\|P(Y_t \in \cdot) - \pi\| = \|P(X_t \in \cdot) - \pi\|,$$

so we may, and will, work with $Y_t$ instead of $X_t$ itself. The process
$\{Y_t\}$ describes the (time-inhomogeneous) Markov chain one gets by, at time
$t$, making a random-to-bottom shuffle modulo $n$ on the subset of positions
$A_t := \{n-k+1-t, n-k+2-t, \ldots, n-t\}$ modulo $n$. (The set $A_t$
corresponds to the bottom $k$ positions for $\{X_t\}$.)
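
The effect of the re-indexing can be illustrated with a small simulation: in the shifted coordinates, every card outside the moving window stays where it is. The sketch below assumes 0-indexed positions, represents a deck by the list `positions` (with `positions[c]` the position of card `c`), and implements $\beta_t$ as subtraction of $t$ modulo $n$; the function names are hypothetical.

```python
import random


def bottom_to_top_step(positions: list[int], k: int, rng: random.Random) -> list[int]:
    """One bottom-to-top shuffle: a uniform card among the bottom k goes to position 0."""
    n = len(positions)
    picked = rng.randrange(n - k, n)  # position of the card that is moved
    new_positions = []
    for pos in positions:
        if pos == picked:
            new_positions.append(0)        # the picked card goes to the top
        elif pos < picked:
            new_positions.append(pos + 1)  # cards above it shift one step down
        else:
            new_positions.append(pos)      # cards below it stay put
    return new_positions


def shifted(positions: list[int], t: int) -> list[int]:
    """Y_t: regard position j + t (mod n) as position j."""
    n = len(positions)
    return [(pos - t) % n for pos in positions]


def fixed_outside_window(n: int, k: int, steps: int, seed: int = 0) -> bool:
    """Check that cards above the bottom k positions of X_t do not move in Y-coordinates."""
    rng = random.Random(seed)
    positions = list(range(n))  # card c starts at position c
    for t in range(steps):
        new_positions = bottom_to_top_step(positions, k, rng)
        y_old = shifted(positions, t)
        y_new = shifted(new_positions, t + 1)
        for c in range(n):
            if positions[c] < n - k and y_old[c] != y_new[c]:
                return False
        positions = new_positions
    return True


if __name__ == "__main__":
    print(fixed_outside_window(n=20, k=5, steps=200))
```

The check holds for every realisation of the randomness, since a card above the window always moves exactly one step down in $\{X_t\}$.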

We will couple another process $\{Y_t'\}$ with the same updating mechanism,
but started from stationarity, with $\{Y_t\}$. The coupling rule is the
following: For all cards, $c$, such that $Y_t(c) \in A_t$ and
$Y_t'(c) \in A_t$, move $c$ to position $n - t$ in $Y_t'$ if and only if it is
moved also in $Y_t$. If the card moved in $Y_t$ is not in an $A_t$-position in
$Y_t'$, then pick for $Y_t'$ a card chosen uniformly from those cards that are
in an $A_t$-position in $Y_t'$ but not in $Y_t$. Clearly, $\{Y_t'\}$ has the
correct updating distribution and the coupling has the following important
properties:

• A card $c$ in $\{Y_t\}$ cannot pass its copy in $\{Y_t'\}$ unless $Y_t(c)$
and $Y_t'(c)$ are both in $A_t$.

• Once a card $c$ has been in an $A_t$-position in $Y_t$ and $Y_t'$
simultaneously, it will be matched as soon as it leaves $A_t$. This is because
the only way a card can leave $A_t$ is by being picked as the card moved to
the bottom of $A_t$ and by the nature of the coupling, this happens
simultaneously for the two decks.

Consequently, let $T_0(c)$ be the first time that card $c$ has been in $A_t$
simultaneously for the two decks:

$$T_0(c) := \min\{t : Y_t(c) \in A_t \text{ and } Y_t'(c) \in A_t\}$$

and let $T_0 = \max_c T_0(c)$. Then at time $T_0$ all cards outside $A_t$ must
be matched. However, then, by the nature of the shuffle and the coupling, all
cards will be matched as soon as all cards in $A_t$ have left $A_t$. Putting
$T$ for the first time this has happened, the coupling inequality tells us
that

$$\|P(Y_t \in \cdot) - \pi\| \le P(T > t),$$

so for the rest of this section, we focus on estimating $T$. By the standard
coupon collector's problem, for any $a > 0$,
$P(T - T_0 > (1 + a)k \log k) = o(1)$ unless $k = O(1)$, in which case
$P(T - T_0 > f_n) = o(1)$ as soon as $f_n = \omega(1)$. Now what about $T_0$?

Consider the motion of a single card $c$ in $\{Y_t\}$. Denote the time interval
between two successive occasions when $c$ leaves $A_t$, by a cycle for $c$. A
moment's thought reveals that during a cycle $c$ moves $k - G$ steps down the
deck (modulo $n$), where $G$ is a random variable with geometric distribution
with parameter $1/k$. Thus, putting $\tau_j = \tau_j(c)$ for the $j$th time,
$j = 1, 2, 3, \ldots$, that $c$ leaves $A_t$, the process
$\{Y_{\tau_j}(c)\}_{j=1}^{\infty}$ describes a random walk on $\mathbb{Z}_n$ with
step size distribution $k - G$. Keeping track of how $c$ crosses the
top/bottom border of the deck, we get a random walk on $\mathbb{Z}$ with step
size distribution $k - G$; in particular, the step size mean is 0 and the step
size variance is $k^2(1 - 1/k) = k(k-1)$. Now considering $c$'s motion in
$\{Y_t'\}$ gives a corresponding sequence $\{\tau_j'\}$ of stopping times and a
corresponding random walk with the same properties. Let
$J = \min\{j : \tau_j' > T_0(c)\}$. Then if $Y_0(c) > Y_0'(c)$,

$$\tau_1 < \tau_1' < \tau_2 < \tau_2' < \cdots < \tau_J = \tau_J' < \tau_{J+1} = \tau_{J+1}' < \cdots$$

or

$$\tau_1 < \tau_1' < \tau_2 < \tau_2' < \cdots < \tau_{J-1}' = \tau_J < \tau_J' = \tau_{J+1} < \cdots,$$

depending on if $T_0(c)$ coincides with $c$ entering $A_t$ in $Y_t'$ or $Y_t$.
If $Y_0(c) < Y_0'(c)$, then these relations with the primed and nonprimed
quantities interchanged hold. In the sequel we assume that
$Y_0(c) > Y_0'(c)$; the other case is treated analogously.
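
The two moments of the step size quoted above are easy to confirm numerically. The sketch below assumes that $G$ is geometric on $\{1, 2, \ldots\}$ with success probability $1/k$ (so that $EG = k$) and sums the series directly; the truncation point `terms` and the function name are arbitrary illustrative choices.

```python
def step_moments(k: int, terms: int = 10_000) -> tuple[float, float]:
    """Mean and variance of the step k - G, with G geometric(1/k) on {1, 2, ...}."""
    p = 1.0 / k
    mean = 0.0
    second_moment = 0.0
    for g in range(1, terms + 1):
        prob = (1.0 - p) ** (g - 1) * p  # P(G = g)
        step = k - g
        mean += prob * step
        second_moment += prob * step * step
    return mean, second_moment - mean * mean


if __name__ == "__main__":
    for k in (2, 5, 10):
        mean, variance = step_moments(k)
        print(f"k={k}: mean={mean:.6f}, variance={variance:.6f}, k(k-1)={k * (k - 1)}")
```

For moderate $k$ the truncation error is negligible, since the tail of the geometric distribution decays exponentially.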

Put $S_j = Y_{\tau_j}(c) - Y'_{\tau_j'}(c)$. Then $\{S_j\}$ for the first
$J - 1$ steps behaves like the difference between two independent random
walks on $\mathbb{Z}_n$. Thus, $\{S_j\}$ itself for the first $J - 1$ steps
describes a symmetric random walk on $\mathbb{Z}$ with step size variance
$2k(k-1)$. To make this exact, define a third deck $\{Y_t''\}$ such that
$Y_t''$ coincides with $Y_t'$ for $t < T_0(c)$, but let $Y_t''$ evolve
independently of $Y_t$ from time $T_0(c)$ and on. Associate, in analogy with
the above, with $Y_t''$ the stopping times $\tau_j''$ and put
$U_j = Y_{\tau_j}(c) - Y''_{\tau_j''}(c)$. Then $\{U_j\}$ describes a random
walk, started from somewhere between 0 and $n$, that for all time is the
difference between two random walks of the kind encountered above, and $U_j$
coincides with $S_j$ for the first $J - 1$ steps. The process $\{U_j\}$ also
has the property that if it, after $j$ steps, has at least once passed the
origin or vertex $n$, then $j \ge J$; this follows from the above properties
of the coupling. Thus, $T_0(c)$ is bounded by the first time $U_j$ passes
outside the interval $[0, n]$.
Let us now bound the probability that $\{U_j\}$ has not passed outside
$[0, n]$ in, say, $j_0$ steps. This probability is maximized when
$U_0 = n/2$, so let us assume that this is the case. Put

$$W_j := \frac{1}{\sqrt{2k(k-1)}}\Bigl(U_j - \frac{n}{2}\Bigr)$$

so that $W_0 = 0$, $\{W_j\}$ has step size variance 1 and $U_j$ passes outside
$[1, n-1]$ when $W_j$ passes outside

$$\Bigl[-\frac{n}{2^{3/2}\sqrt{k(k-1)}},\ \frac{n}{2^{3/2}\sqrt{k(k-1)}}\Bigr].$$


From here we must treat the cases $k = o(n)$ and $k = \Theta(n)$ separately.
Assume first that $k = o(n)$. Let

$$M := \frac{n}{2^{3/2}\sqrt{k(k-1)}}.$$

By Donsker's theorem (since $M \to \infty$) (see, e.g., [3], Section 7.6)

$$\Bigl\{\frac{1}{M}W_{\lfloor sM^2 \rfloor}\Bigr\}_{s \in [0,\infty)} \xrightarrow{D} \{B_s\}_{s \in [0,\infty)}$$

as $n \to \infty$, where $\{B_s\}$ stands for standard Brownian motion. Thus,

$$\begin{aligned}
P\Bigl(\forall s \le s_0 : \frac{1}{M}|W_{\lfloor sM^2 \rfloor}| < 1\Bigr) &= (1 + o(1))P(\forall s \le s_0 : |B_s| < 1) \\
&\le (1 + o(1))\frac{4}{\pi}e^{-\pi^2 s_0/8},
\end{aligned}$$

where the last bound can be found, for example, in [3], Section 7.8. With
$s_0 = (1 + o(1))(8/\pi^2)\log n$, the right-hand side is $o(n^{-1})$.
Translating this back to $\{U_j\}$ tells us that the probability that $U_j$
has not passed outside $[0, n]$ after $(1 + o(1))(8/\pi^2)M^2 \log n$ steps is
$o(n^{-1})$. Thus, with

$$j_0 := (1 + o(1))\frac{n^2}{\pi^2 k(k-1)}\log n,$$

we have

$$P(J > j_0) = o(n^{-1}).$$



Now how long does it take before $c$ has gone through $j_0$ cycles? This time,
say, $T_1$, can be written as

$$T_1 = \eta + \sum_{j=1}^{j_0} \xi_j,$$

where $\eta$ is the time taken until $c$ leaves $A_t$ for the first time, and
the $\xi_j$'s are the $j_0$ independent cycle times. Since $\eta$ represents
the time taken for a partial cycle, $\eta$ is stochastically dominated by a
random variable $\xi_0$ with the same distribution as
$\xi_1, \ldots, \xi_{j_0}$ and so $T_1$ is dominated by
$T_2 := \sum_{j=0}^{j_0} \xi_j$. The $\xi_j$'s have distribution $n - k + G$,
where $G$ is geometric with parameter $1/k$. Hence, $ET_2 = n(j_0 + 1)$ and we
want to bound $P(T_2 > (1 + a)n(j_0 + 1))$ for an arbitrary small $a > 0$.
However, this probability coincides with the probability that $B \le j_0$,
where $B$ is binomial with parameters $(an + k)(j_0 + 1)$ and $1/k$. Since
$j_0 = \Omega(\log n)$ and $k = o(n)$, it follows from standard Chernoff bounds
that

$$P(T_2 > (1 + a)nj_0) = o(n^{-1}).$$

Thus, $P(\tau_{j_0} > (1 + a)nj_0) = o(n^{-1})$ and since it was shown above
that $P(J > j_0) = o(n^{-1})$ and since $T_0(c) \le \tau_J'$, we get that

$$P(T_0(c) > (1 + a)nj_0) = o(n^{-1}).$$

Summing over the cards,

$$P(T_0 > (1 + a)nj_0) = o(1).$$

Finally, since $k = o(n)$, $k \log k$ is of smaller order than $nj_0$ [or when
$k = O(1)$, then $f_n$ is of smaller order than $nj_0$], and so

$$P(T > (1 + a)nj_0) = o(1).$$

We have thus arrived at the following upper bound on the mixing time:

$$\tau_{\mathrm{mix}} \le (1 + o(1))\frac{n^3}{\pi^2 k(k-1)}\log n.$$

In the case $k = \Theta(n)$, the above approach does not work exactly as it
stands for two reasons:

• Since $n$ and $k$ are of the same order, the Brownian motion approximation
does not work.

• The $k \log k$ term for $T - T_0$ is of the same order as $nj_0$.

However, it is easy to modify the arguments slightly to give an upper bound
of the type $Cn \log n$ for some constant $C$. A more detailed analysis would
also give an estimate for $C$, but since these estimates appear to be well
above what one would guess are the correct figures (e.g., when $k$ is close to
$n$, an estimate of $C$ lands well above 1), we omit that here.

The following theorem summarizes our results on the bottom-to-top shuffle:

Theorem 4.7. Let $\tau_{\mathrm{mix}}$ be the mixing time for the
bottom-to-top shuffle where the card taken to the top at each shuffle is
chosen among the bottom $k$ cards. Then if $k = o(n)$,

$$(1 + o(1))\frac{n^3}{4\pi^2 k(k-1)}\log n \le \tau_{\mathrm{mix}} \le (1 + o(1))\frac{n^3}{\pi^2 k(k-1)}\log n.$$

If $k = \Omega(n)$, then $\tau_{\mathrm{mix}} = \Theta(n \log n)$.
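
To get a feeling for the size of these bounds, the leading terms can be evaluated directly. The sketch below drops the $(1 + o(1))$ factors, so the numbers are only indicative for finite $n$; it also recomputes the number of cycles $j_0 = (8/\pi^2)M^2 \log n$ from the upper-bound argument and confirms that $n j_0$ matches the leading term of the upper bound. Function names are hypothetical and $k \ge 2$ is assumed.

```python
import math


def lower_bound(n: int, k: int) -> float:
    """Leading term n^3 / (4 pi^2 k (k-1)) * log n of the lower bound (k >= 2)."""
    return n ** 3 / (4 * math.pi ** 2 * k * (k - 1)) * math.log(n)


def upper_bound(n: int, k: int) -> float:
    """Leading term n^3 / (pi^2 k (k-1)) * log n of the upper bound (k >= 2)."""
    return n ** 3 / (math.pi ** 2 * k * (k - 1)) * math.log(n)


def cycles_needed(n: int, k: int) -> float:
    """Leading term of j_0 = (8 / pi^2) * M^2 * log n with M = n / (2^{3/2} sqrt(k(k-1)))."""
    m = n / (2 ** 1.5 * math.sqrt(k * (k - 1)))
    return (8 / math.pi ** 2) * m * m * math.log(n)


if __name__ == "__main__":
    n, k = 52, 2
    print(f"lower bound ~ {lower_bound(n, k):.0f} shuffles")
    print(f"upper bound ~ {upper_bound(n, k):.0f} shuffles")
    print(f"n * j_0     ~ {n * cycles_needed(n, k):.0f} shuffles")
```

The ratio between the two leading terms is exactly 4, independently of $n$ and $k$.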